
From: Davis, Natalie 

Sent: Tuesday, January 22, 2002 1:10 PM 

To: STIC-ILL 

Subject: 09/823101 



Please send the following: 

1. Kato, et al., Nucleic Acids Res 1983 Dec 10;11(23):8197-203

2. Myers, et al., J Biol Chem 1983 Aug 25;258(16):10128-35



Natalie A. Davis, PhD

Patent Examiner 
Art Unit 1642 
CM1, Rm 8B13 
Mailbox 8E12 



Ph (703) 308-6410 








01/24/2002 14:13 FAX PATENT&TRADEMARK 003/012


AUGUST 25, 1983 VOLUME 258 NUMBER 16
HEALTH SCIENCES LIBRARY
University of Wisconsin, Madison, Wis. 53706

ISSN 0021-9258 JBCHA3 258(16) 9581-10192 (1983)



The Journal of Biological Chemistry



Published by The American Society of Biological Chemists, Inc. 

FOUNDED BY CHRISTIAN A. HERTER

AND SUSTAINED

IN PART BY THE CHRISTIAN A. HERTER MEMORIAL FUND






Vol. 258, No. 16 



The Journal of 

Biological Chemistry 

Copyright © 1983 by the American Society of Biological Chemists, Inc., 428 East Preston St., Baltimore, Md. 21202 U.S.A.



August 25, 1983 



CONTENTS* 



COMMUNICATIONS 



9581 Somatomedin-C stimulated the phosphorylation of the β-subunit of its own receptor.
Steven Jacobs, Frederick C. Kull, Jr., H. Shelton Earp, Marjorie E. Svoboda, Judson J. Van Wyk, and Pedro Cuatrecasas

9585 Genetic polymorphisms for a phenobarbital-inducible cytochrome P-450 map to the Coh locus in mice.
Daniel L. Simmons and Charles B. Kasper

9589 Protease-activated kinase II as the potential mediator of insulin-stimulated phosphorylation of ribosomal protein S6.
Olga Perisic and Jolinda A. Traugh

9593 Expression of human and Chinese hamster hypoxanthine-guanine phosphoribosyltransferase cDNA recombinants in cultured Lesch-Nyhan and Chinese hamster fibroblasts.
John Brennand, David S. Konecki, and C. Thomas Caskey

9597 Degradation of microinjected methylated and unmethylated proteins in hepatoma tissue culture cells.
Revital Katznelson and Richard G. Kulka

9601 Seasonal variations in different forms of pokeweed antiviral protein, a potent inactivator of ribosomes.
L. L. Houston, S. Ramakrishnan, and Mark A. Hermodson

9605 Insulin receptor down regulation in human erythrocytes.
Scott W. Peterson, Amy L. Miller, Robin S. Kelleher, and Edward F. Murray

9608 Sites of methyl esterification on the aspartate receptor involved in bacterial chemotaxis.
Thomas C. Terwilliger, Elena Bogonez, Elizabeth A. Wang, and Daniel E. Koshland, Jr.

9612 The structure of two blood group A-active glycosphingolipids with 12 sugars and a branched chain present in the epithelial cells of rat small intestine.
Gunnar C. Hansson

9616 Leukotriene C4 binding to rat lung membranes.
Sheng-Shung Pong, Robert N. DeHaven, Frederick A. Kuehl, Jr., and Robert W. Egan

9620 Biochemical effects of dipyridamole on purine overproduction and excretion by mutant murine T-lymphoblasts.
Buddy Ullman and Kiran Kaur

9623 Acylation of CDP-monoacylglycerol cannot be confirmed.
William Thompson and Richard T. Zuh

9624 Endogenous phosphates on liver glycogen synthase D and synthase I. Studies on the number and location.
Agnes W. H. Tan and Frank Q. Nuttall

9631 Activation of mouse peritoneal macrophages by lipopolysaccharide alters the kinetic parameters of the superoxide-producing NADPH oxidase.
Masataka Sasada, Michael J. Pabst, and Richard B. Johnston, Jr.

9636 Binding of cAMP derivatives to Dictyostelium discoideum cells. Activation mechanism of the cell surface cAMP receptor.
Peter J. M. Van Haastert and Erik Kien

9643 Binding of cAMP and adenosine derivatives to Dictyostelium discoideum cells. Relationships of binding, chemotactic, and antagonistic activities.
Peter J. M. Van Haastert



9649 The 3'-5' proofreading exonuclease of bacteriophage T4 DNA polymerase is stimulated by other T4 DNA replication proteins.
Patricia Bedinger and Bruce M. Alberts

9657 Protonic inhibition of the mitochondrial oligomycin-sensitive adenosine 5'-triphosphatase in ischemic and autolyzing cardiac muscle. Possible mechanism for the mitigation of ATP hydrolysis under nonenergizing conditions.
William Rouslin

9662 Kinetic studies of calcium release from sarcoplasmic reticulum in vitro.
Do Han Kim, S. Tsuyoshi Ohnishi, and Noriaki Ikemoto

9669 Isolation and functional characterization of the active light chain of activated human blood coagulation factor XI.
Fedde van der Graaf, Judith S. Greengard, Bonno N. Bouma, Danielle M. Kerbiriou, and John H. Griffin

9676 The transferrin cycle and iron uptake in rabbit reticulocytes. Pulse studies using 59Fe, 125I-labeled transferrin.
Marco-Tulio Nunez and Jonathan Glass

9681 Kinetics of internalization and recycling of transferrin and the transferrin receptor in a human hepatoma cell line. Effect of lysosomotropic agents.
Aaron Ciechanover, Alan L. Schwartz, Alice Dautry-Varsat, and Harvey F. Lodish

9690 Sugar transport by the bacterial phosphotransferase system. Preparation of a fluorescein derivative of the glucose-specific phosphocarrier protein IIIGlc and its binding to the phosphocarrier protein HPr.
Edward G. Jablonski, Ludwig Brand, and Saul Roseman

9700 The reaction of 8-mercaptoflavins and flavoproteins with sulfite. Evidence for the role of an active site arginine in D-amino acid oxidase.
Paul F. Fitzpatrick and Vincent Massey

9706 Molecular weight of the functional unit of human leukocyte, fibroblast, and immune interferons.
Sidney Pestka, Bruce Kelder, Philip C. Familletti, John A. Moschera, Robert Crowl, and Ellis S. Kempner

9710 Mixed type inhibition of the renal Na+/H+ antiporter by Li+ and amiloride. Evidence for a modifier site.
Harlan E. Ives, Victoria J. Yee, and David G. Warnock

9717 Identification by direct photoaffinity labeling of an altered phosphodiesterase in a mutant S49 lymphoma cell.
Vincent E. Groppi, Florence Steinberg, Harvey R. Kaslow, Naomi Walker, and Henry R. Bourne

9724 ATP activation of parathyroid hormone cleavage catalyzed by cathepsin D from bovine kidney.
Sreekumar Pillai, Robert Botti, Jr., and James E. Zull

9729 Reaction of dATP with N-methyl-N-nitrosourea in vitro.
Mary S. Baker and Michael D. Topal

9733 Purification and properties of a pantetheine-hydrolyzing enzyme from pig kidney.
Carl T. Wittwer, Dave Burkhard, Kirk Ririe, Randy Rasmussen, Jack Brown, Bonita W. Wyse, and R. Gaurth Hansen

9739 Type Ic, a novel glycogenosis. Underlying mechanism.
Robert C. Nordlie, Katherine A. Sukalski, Juan M. Muñoz, and Jerry J. Baldwin



* The CONTENTS arranged by Subject Categories will be found immediately following these CONTENTS.
Full Instructions to Authors will be found in The Journal, 258, 1 (1983), and reprints may be obtained from the editorial office.






The Journal of Biological Chemistry

Vol. 258, No. 16, Issue of August 25, pp. 10128-10135, 1983

Printed in U.S.A.






Analysis of the 3' End of the Human Pro-α2(I) Collagen Gene

UTILIZATION OF MULTIPLE POLYADENYLATION SITES IN CULTURED FIBROBLASTS*

(Received for publication, January 21, 1983)

Jeanne C. Myers, Leon A. Dickson, Wouter J. de Wet, Michael P. Bernard, Mon-Li Chu, Maurizio Di Liberto, Guglielmina Pepe, Frank O. Sangiorgi, and Francesco Ramirez

From the Departments of … and Obstetrics and Gynecology, University of Medicine and Dentistry of New Jersey,

the Department of Biochemistry, University of Medicine and Dentistry of New Jersey, …

School of Osteopathic Medicine, Piscataway, New Jersey 08854



Three overlapping genomic clones covering 28 kilobases of the human pro-α2(I) collagen gene have been isolated from a λ phage library. The analysis of 12 introns and 12 exons in the 3' end region has shown that the human gene has a structure remarkably similar to that reported for the homologous chicken gene. One large intron, in the α-chain domain, contains an AluI sequence flanked by short direct repeats; a second AluI sequence is present 4 kilobases downstream from the termination codon. The analysis of the exon coding for the 3'-untranslated region has revealed that the pro-α2(I) collagen gene transcribes at least four different mRNAs in cultured fibroblasts. The colinearity and exact location of the termini of these transcripts were determined by Northern blots, R-looping analysis, S1 protection, and DNA sequencing. The ends of two transcripts are closely preceded by the canonical polyadenylation signal (AAUAAA), whereas two of its variations (AUUAAA and AUUAA) precede the ends of the other two transcripts.



The structural integrity of most organs and tissues depends on the harmonious expression of a complex battery of genes, including the multigene family encoding the different collagens. The developmentally regulated expression of the collagen genes results in the synthesis of at least nine different products, which are subjected to a complex array of posttranslational modifications and extracellular processing to produce the five different types of mature collagens known in vertebrates. The native proteins (procollagens) consist of three identical or similar pro-α-chains, each with an NH2-terminal propeptide, a COOH-terminal propeptide, and a central triple helical α-chain domain with a repetitive tripeptide structure (Gly-X-Y)n. In the fibrillar collagens (types I-III), specific amino- and carboxyendopeptidases cleave the propeptide segments extracellularly before the mature proteins undergo the process of fiber formation (1). Alterations in the structure, synthesis, or processing of these proteins in man may result in a number of inherited disorders, such as osteogenesis imperfecta, chondrodystrophy, Marfan syndrome, and Ehlers-Danlos syndrome (2). One of our primary goals was to isolate the human type I procollagen genes and analyze their structure in detail, in order to begin to understand the factors involved in their complex coordinated expression in normal and diseased tissues.

Type I procollagen, the most abundant of the five different collagens identified in higher vertebrates, is a major component of skin, tendons, and bones. This heterotrimer consists of two identical pro-α1(I) chains and one pro-α2(I) chain and is, therefore, the product of two coordinately expressed genes (1). These two genes have recently been assigned to chromosome 7 (pro-α2(I)) and chromosome 17 (pro-α1(I)) (3, 4) using cloned cDNAs specific for these two chains (5, 6). Sequencing of the pro-α2(I) cDNA clones has allowed, for the first time, the determination of the primary structure of more than half of the human pro-α2(I) chain (7). The comparison of the human sequences with the previously published data on the homologous avian chain (8, 9) has made possible the examination of the evolution of this gene in two species which diverged more than 300 million years ago (10).

Here, we report the isolation of three overlapping genomic clones covering 28 kb¹ of the human pro-α2(I) collagen gene, from its 3' end to amino acid 19 in the helical portion of the α2(I) chain. The human pro-α2(I) gene exhibits a complexity of intron-exon organization analogous to that of the collagen genes of other vertebrates, particularly in the size and distribution of the four exons coding for the COOH-propeptide (9, 11-14).²,³ Two repeated sequences, members of the AluI family, are associated with the human pro-α2(I) collagen gene. The first AluI sequence (α2R1) is present in a large intron between amino acid residues 765 and 766 in the α-chain. The second AluI sequence (α2R2) is located in the 3'-flanking region, 4 kb downstream from the termination codon. Both repeats are preceded by a 5' poly(T) stretch and are flanked by a number of short direct repeats.

The detailed characterization of the first exon of the human pro-α2(I) gene has revealed the presence of multiple transcripts in cultured fibroblasts. We were able to characterize at least four different transcripts, varying in the length of their 3'-untranslated regions. Two of these utilize the canonical polyadenylation signal (AAUAAA), whereas the other two utilize two variations of it (AUUAAA and AUUAA). Multiple transcripts with similar characteristics have already been described for other eukaryotic genes (15-19), and in one case they have been correlated with tissue specificity (20).
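The three signals differ only by an A→U substitution or by length, so a simple motif scan is enough to tell them apart in a sequence. The sketch below is a hypothetical helper, not the procedure used in this study. It checks the motifs in a fixed order at each position, so an AUUAAA site is not also reported as the shorter AUUAA. The toy sequence is invented for illustration.

```python
# Hypothetical helper: locate candidate polyadenylation signals in a sequence.
SIGNALS = ("AAUAAA", "AUUAAA", "AUUAA")  # canonical signal first, then variants


def find_poly_a_signals(seq: str) -> list[tuple[int, str]]:
    """Return (0-based position, motif) for each signal found in seq.

    DNA input is accepted (T is read as U). If several motifs start at the
    same position (AUUAAA also begins with AUUAA), only the first motif in
    SIGNALS order is reported.
    """
    rna = seq.upper().replace("T", "U")
    hits = []
    for i in range(len(rna)):
        for motif in SIGNALS:
            if rna.startswith(motif, i):
                hits.append((i, motif))
                break  # first match in SIGNALS order wins at this position
    return hits


if __name__ == "__main__":
    toy = "GCAAUAAAGCUUAUUAAAGCAUUAAGC"  # invented, not from the gene
    print(find_poly_a_signals(toy))
    # -> [(2, 'AAUAAA'), (12, 'AUUAAA'), (20, 'AUUAA')]
```

A real scan would restrict the search to a window upstream of each mapped 3' end, because these short motifs also occur by chance elsewhere in a sequence.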



¹ The abbreviations used are: kb, kilobases; bp, base pairs.
² M. P. Bernard, M. L. Chu, J. C. Myers, F. Ramirez, K. Eikenberry, and D. J. Prockop, manuscript in preparation.
³ Y. Yamada, personal communication.

MATERIALS AND METHODS 

Enzymes and Isotopes – Restriction endonucleases and other nucleic acid-modifying enzymes were purchased from Bethesda Research Laboratories and New England Biolabs (Beverly, MA) and used according to the manufacturer's specifications. S1 nuclease was purchased from Miles Laboratories Inc. (Elkhart, IN), ultrapure formamide from Fluka, labeled isotopes from New England Nuclear and Amersham Corp., and nitrocellulose filters from Schleicher & Schuell.

Screening of Genomic Library – The genomic library in Charon 4A used in these studies was obtained from Dr. T. Maniatis (Harvard University) and contained 15-20-kb inserts of human nuclear DNA partially digested with the enzymes AluI and HaeIII (21). The screening for the pro-α2(I) collagen clones, their isolation and amplification, and DNA purification were performed as described (22). All the experiments were conducted using the appropriate level of biological and physical containment as detailed in the National Institutes of Health guidelines for recombinant DNA research.

RNA Isolation and DNA Sequencing – Total poly(A+) RNA was purified from cultured human fibroblasts as previously described (23). The appropriate DNA fragments were sequenced by the chemical modification procedure of Maxam and Gilbert (24). Both strands were sequenced for most of the regions detailed in this paper.

Hybridization to RNA and DNA Immobilized onto Nitrocellulose – Total poly(A+) RNA was electrophoresed in 0.7% agarose gels in 2 M formaldehyde and transferred at 4 °C by blotting onto nitrocellulose paper (25). Restricted DNA in agarose gels was denatured and neutralized in situ and transferred onto nitrocellulose paper using the techniques of Southern (26). Nucleic acid filter-bound hybridizations were performed as previously described (5, 6).

Electron Microscopy – R-looping was carried out according to the protocol of Kaback et al. (27), and heteroduplexing according to the method described by Davis et al. (28). DNA molecules were visualized and photographed with a JEOL electron microscope and measured at a final magnification of 45,000 with a Hewlett-Packard 9810 calculator equipped with a 9864A digitizer. Double-stranded replicative form (RF) of phage φX174 DNA was included for length calibration.
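Contour lengths measured on micrographs come in arbitrary digitizer units, so each grid needs an internal length standard. That is the role of the φX174 RF molecules. The sketch below shows the conversion under two assumptions: the standard φX174 length of 5,386 bp, and several molecules measured per segment (as for the ± values in Table I). The function names and measurement values are hypothetical.

```python
import statistics

PHIX174_BP = 5386  # phiX174 genome length; the RF form is the duplex circle


def bp_per_unit(phix_lengths: list[float]) -> float:
    """Scale factor (bp per digitizer unit) from phiX174 RF contour lengths."""
    return PHIX174_BP / statistics.mean(phix_lengths)


def segment_size_bp(lengths: list[float], scale: float) -> tuple[float, float]:
    """Mean and sample SD (bp) of one segment measured on several molecules."""
    if len(lengths) < 2:
        raise ValueError("need at least two molecules for a standard deviation")
    sizes = [x * scale for x in lengths]
    return statistics.mean(sizes), statistics.stdev(sizes)


if __name__ == "__main__":
    scale = bp_per_unit([2693.0, 2693.0])  # hypothetical: 2 bp per unit
    print(segment_size_bp([50.0, 60.0, 40.0], scale))  # -> (100.0, 20.0)
```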

S1 Nuclease Protection – End-labeled genomic fragments were used for the S1 protection experiments with the protocol described by Berk and Sharp (29). The 3' end labeling was carried out by the addition of [α-32P]deoxynucleotides (specific activity: 1,000 Ci/mmol) using the Klenow fragment of DNA polymerase I. The labeled fragments were heat-denatured in 30% dimethyl sulfoxide, strand-separated on polyacrylamide gels, electroeluted, and annealed to total fibroblast poly(A+) RNA prior to S1 digestion. The exact sizes of the S1-resistant products were determined by electrophoresis on a 5% sequencing gel (80 cm) in parallel with DNA fragments that had been 5' end-labeled and subjected to the Maxam and Gilbert (24) chemical modification reactions.
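In this design, the label sits at a restriction site inside the transcribed region, on the 3' end of the probe strand. S1 then trims away the part of the probe that extends beyond the RNA 3' end, so the length of each labeled, protected fragment measures the distance from the labeled site to a poly(A) addition site. The sketch below shows that arithmetic. It assumes sense-strand coordinates that increase 5'→3' and a fragment length that includes the labeled nucleotide. The coordinates and lengths are invented.

```python
def map_three_prime_ends(label_site: int, protected_lengths: list[int]) -> list[int]:
    """Sense-strand coordinates of RNA 3' ends from S1-protected fragment lengths.

    Assumes coordinates increase 5' -> 3' on the sense strand, label_site is
    the labeled probe terminus (upstream of the poly(A) sites), and each
    fragment length counts the labeled nucleotide itself.
    """
    if any(n < 1 for n in protected_lengths):
        raise ValueError("fragment lengths must be positive")
    return [label_site + n - 1 for n in sorted(protected_lengths)]


if __name__ == "__main__":
    # Hypothetical label site and fragment sizes read off a sequencing ladder.
    print(map_three_prime_ends(1000, [350, 120]))  # -> [1119, 1349]
```

Reading the fragment sizes against a Maxam-Gilbert ladder of the same region is what makes single-nucleotide resolution possible here. The arithmetic itself is trivial once the label position is fixed.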

RESULTS AND DISCUSSION 

Gene Isolation – The pro-α2(I) cDNA clones Hf-32 and Hf-1131 (5, 7) were used for the initial screening of the genomic library. A positive clone (NJ-1), 16.8 kb in length, was isolated, and appropriate subclones were subsequently used to isolate the 5' end (NJ-3) and 3' end (NJ-6) overlapping genomic clones. The three clones (40 kb in total length) were extensively characterized by restriction endonuclease mapping and by Southern blot hybridization with different subfragments of Hf-32 and Hf-1131 in order to define their sequential orientation. The continuity of the overlapping genomic regions was confirmed by Southern blot analysis of nuclear DNA digested with various restriction enzymes. The presence of repeated sequences associated with the pro-α2(I) gene was determined by Southern blot hybridization of the three clones with "nick-translated" total human DNA. Electron microscopy studies and DNA sequencing were performed for the portions of the gene that are the subject of the investigations presented here.
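Ordering overlapping clones into a single contiguous region amounts to merging intervals on a common coordinate axis. The sketch below assumes that each clone has already been placed on such an axis (in kb) by restriction mapping. The clone labels and coordinates are hypothetical and are not the actual positions of NJ-1, NJ-3, and NJ-6.

```python
def merge_intervals(intervals: list[tuple[float, float]]) -> list[tuple[float, float]]:
    """Merge overlapping or abutting (start, end) intervals into disjoint spans."""
    merged: list[tuple[float, float]] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Overlaps the current span: extend it if this clone reaches further.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


if __name__ == "__main__":
    clones = {"A": (12.0, 28.8), "B": (0.0, 15.0), "C": (26.0, 40.0)}  # hypothetical kb
    spans = merge_intervals(list(clones.values()))
    print(spans)  # -> [(0.0, 40.0)]
    print("contiguous" if len(spans) == 1 else "gap between clones")
```

A single merged span only says the mapped coordinates overlap. Confirming that the overlaps are genuine, as the genomic Southern blots above did, is a separate experimental check.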

A composite restriction map of the human pro-α2(I) collagen gene and its relationship to the different domains of the protein is depicted in Fig. 1. The clones span from amino acid 19 in the α-chain to the 3'-flanking region. They cover 28 kb of the pro-α2(I) collagen gene and contain almost 4 kb of coding sequences. This ratio of coding to interspersed noncoding sequences is almost identical with that found by Wozney et al. (13) for the chicken pro-α2(I) gene. We can, therefore, safely extrapolate that the size of the entire human pro-α2(I) collagen gene should not vary significantly from the 38-kb value reported for the avian gene.

Analysis of the 3' End of the Gene: Exon-Intron Arrangement – The complexity of the chicken pro-α2(I) collagen gene has been documented by a series of elegant investigations. These have shown that this coding unit is interrupted by almost 50, often very large, introns, resulting in a gene at least eight times the size of its mature transcript (for a review see Tate et al. (30)). The function of introns in eukaryotic genes is still a subject of speculation; one theory holds that they separate exons encoding different functional or conformational segments within the same protein (31). This theory has an evolutionary significance, because it implies that complex proteins can evolve different functional domains independently of each other. Moreover, a particular function could be developed only once during evolution and then, through recombination and rearrangement of blocks of DNA, be dispersed among different genes. In line with this idea, Wozney et al. (13) have suggested that the four exons encoding the chicken pro-α2(I) collagen COOH-propeptide represent four different functional domains of this portion of the protein. The same group has also observed the absence of introns in the junction regions between the terminal propeptides and the helical portion of the α-chain. They have concluded that the junction exons represent evolutionarily stable domains of the gene, because they encode the endopeptidase cleavage sites of the fibrillar collagens.

The isolation of the human pro-α2(I) gene has now allowed us to compare these features in the mammalian gene. A map of the intron-exon arrangement in the 9 kb of the gene extending from residue 765 in the α-chain to the end of the 3'-untranslated region was determined by electron microscopy (Fig. 1). The approximate sizes of the introns and exons, as determined by electron microscopy, are summarized in Table I. A more detailed analysis of some important sections was obtained by DNA sequencing (Figs. 2 and 6). Twelve exons and 12 introns are present in this region, and their size and distribution are remarkably similar to those reported for the chicken pro-α2(I) gene (30).

The first four exons (759 bp) encode primarily the COOH-terminal propeptide and are interrupted by 2 kb of noncoding sequences distributed among three introns. Exon 1 codes for the last 48 amino acids of the COOH-terminal propeptide and contains the entire 3'-untranslated region. The complete sequence of this exon was determined (Fig. 6) and will be discussed in greater detail in a later section. Exon 2 contains the carbohydrate attachment site in a region that is highly conserved in the human² and chicken pro-α1(I) and pro-α2(I) genes (7, 8), as well as in the avian pro-α1(III)³ gene. Exon 3 contains the tricysteine cluster. Exon 4, the junction exon, codes for the end of the triple helical domain, the telopeptide, and the beginning of the COOH-propeptide. The remaining eight of the 12 exons shown in Fig. 1 code for the COOH-terminal 264 amino acid residues of the α-chain domain. These eight exons are small and are interrupted by seven introns ranging between 100 and 1000 bp in size (Table I). A distinct pattern in the distribution of small and large exons in this region is evident from the data presented in Table I. This pattern closely resembles the arrangement of 54- and 108-bp exons in the same region of the chicken pro-α2(I) gene, which was determined more accurately by direct DNA sequencing (30).
Fig. 1. Restriction map of part of the human pro-α2(I) collagen gene, its relationship to the different domains of the polypeptide chain, and the intron-exon arrangement of the 3' region of the gene. The upper half shows 28 kb of gene (cross-hatched box) spanning from amino acid residue 19 in the α-chain to its 3' end (position 0 in the scale below). The white box indicates the 3'-untranslated region. Also depicted are 5 kb of the 3'-flanking sequences, which include the AluI repeat α2R2; the other repeat, α2R1, is located within the gene 8.5 kb from its 3' end. The letters represent all sites of the following restriction enzymes within the analyzed 28 kb of genomic DNA: B, BamHI; E, EcoRI; H, HindIII; X, XbaI; Xh, XhoI. The EcoRI site located 25 kb from the 3' end of the gene and indicated in parentheses was found to be a polymorphic site in several individuals. The overlaps of the three genomic clones NJ-1, NJ-3, and NJ-6 are depicted underneath the restriction map (broken lines). NJ-6 contains an additional 7 kb of flanking sequences not shown in the figure. The lower half shows an expanded representation of the intron-exon arrangement of the gene from amino acid residue 765 in the α-chain to the 3' end of the gene. The exons (black boxes) and the introns (white boxes) are numbered sequentially from the 3' end with Arabic and Roman numerals, respectively. The locations of the termination codon (TAA), the carboxyendopeptidase cleavage site, and the α2R1 repeat are indicated in exons 1 and 4 and intron XII.



Table I

Exon-intron arrangement of the 3' region of the human pro-α2(I) collagen gene

Exon/Intron     Size (bp)
Exon 1          992 (a, b)
Intron I        770
Exon 2          243
Intron II       300 (c)
Exon 3          192 ± 30 (d)
Intron III      4?2 ± 123 (d)
Exon 4          286 ± 30 (d)
Intron IV       393 ± 43 (d)
Exon 5          118 ± 22 (d)
Intron V        143 ± 23 (d)
Exon 6          74 ± 9 (d)
Intron VI       600 ± 85 (d)
Exon 7          119 ± 16 (d)
Intron VII      471 ± ? (d)
Exon 8          69 ± 10 (d)
Intron VIII     132 ± 26 (d)
Exon 9          113 ± 1? (d)
Intron IX       98 ± 22 (d)
Exon 10         72 ± 8 (d)
Intron X        467 ± 19 (d)
Exon 11         132 ± 19 (d)
Intron XI       747 (c)
Exon 12         113 (c)
Intron XII      1052

a The sizes were determined by direct DNA sequencing.
b Exon 1 contains 144 bp of coding sequences (Fig. 6).
c The sizes were determined by electron microscopy on one molecule.
d The sizes were determined by electron microscopy on at least 15 molecules.



In this context, it must be noted that the data in Table I reflect the real sizes of the exons only approximately. Our preliminary sequence data showed that exons 6, 8, and 10 are indeed 54 bp and that exons 5, 7, 9, 11, and 12 are 108 bp.⁴ Although numerous differences have been found at the nucleotide level between the human and chicken pro-α2(I) collagen cDNAs (7-9), our data indicate that the structure of the 3' end of the gene is almost identical in the two species. The similarity is evident both in the overall intron-exon arrangement and in the distribution of the four coding segments within the COOH-propeptide region. It is interesting to note that both the chicken pro-α1(III)³ and the human pro-α1(I)⁵ collagen genes have the same number of exons coding for the COOH-propeptide. Therefore, at least for the COOH-propeptide region, these observations seem to favor the functional domain hypothesis. However, this hypothesis does not explain the separation of the α-chain region into 40 exons, even by postulating the presence of clusters of functional subdomains (32). The detailed analysis of this particular section of the collagen gene, and the molecular characterization of the defect in those patients in whom structural abnormalities of the α-chain are due to either insertions or deletions (33-35), may in the future help answer some of these questions.
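The 54- and 108-bp sizes are not arbitrary. One Gly-X-Y triplet is three codons, or 9 bp, so a 54-bp exon encodes exactly six triplets and a 108-bp exon twelve. The sketch below shows that arithmetic. It also shows one simple way to relate the approximate electron microscopy sizes of exons 5-11 in Table I to the two size classes: choose the nearest one. That rule is assumed here for illustration and is not taken from the paper.

```python
CODON_BP = 3
TRIPLET_BP = 3 * CODON_BP  # one Gly-X-Y triplet = three codons = 9 bp


def gly_x_y_count(exon_bp: int) -> int:
    """Number of whole Gly-X-Y triplets encoded by an exon of exon_bp base pairs."""
    if exon_bp % TRIPLET_BP:
        raise ValueError(f"{exon_bp} bp is not a whole number of Gly-X-Y triplets")
    return exon_bp // TRIPLET_BP


def nearest_class(measured_bp: float, classes: tuple[int, ...] = (54, 108)) -> int:
    """Assign an approximate (EM) exon size to the closest size class."""
    return min(classes, key=lambda c: abs(measured_bp - c))


# Approximate sizes of exons 5-11 from Table I (electron microscopy).
em_sizes = {5: 118, 6: 74, 7: 119, 8: 69, 9: 113, 10: 72, 11: 132}
assigned = {exon: nearest_class(bp) for exon, bp in em_sizes.items()}

if __name__ == "__main__":
    print(gly_x_y_count(54), gly_x_y_count(108))  # -> 6 12
    print(assigned)
```

For these seven exons the nearest-class rule gives the same 54/108 assignment as the preliminary sequence data cited above. Given the EM standard deviations in Table I, however, it would not be reliable for sizes near the midpoint between the classes.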

The analysis of the 40 kb of genomic DNA covered by the three overlapping clones has also revealed the presence of two short repeated sequences, which have been mapped with respect to the pro-α2(I) collagen gene by Southern blotting hybridization (Fig. 1). Sequencing experiments have shown that they are two different members of the AluI family, which is found randomly interspersed throughout the human genome (36). Several theories have been proposed to explain the function of these elements and the mechanisms behind their ubiquitous dispersal in the mammalian genome. One line of thought has related the short direct repeats flanking the AluI sequences to the direct repeats flanking bacterial transposable elements (37).

⁴ M. Di Liberto, V. Benson, R. Shemesh, T. Mariano, and L. A. Dickson, manuscript in preparation.
⁵ M. L. Chu, W. de Wet, M. P. Bernard, M. Morabito, J. C. Myers, …

Fig. 2. The nucleotide sequence of the AluI repeats associated with the human pro-α2(I) collagen gene. Top, the nucleotide sequence of the AluI repeated sequence α2R1, located in intron XII. Flanking α2R1 are four direct repeats, which are underlined and identified with Roman numerals. Bottom, the nucleotide sequence of the AluI repeated sequence α2R2, located 4 kb downstream from the termination codon. Flanking α2R2 are five direct repeats, which are underlined and identified with Roman numerals.

[Nucleotide sequences of α2R1 (top) and α2R2 (bottom), with direct repeats I-V marked; not legible in this copy.]

One of the pro-α2(I) AluI repeats (α2R1) is contained in intron XII between residues 765 and 766 in the α-chain, 315 nucleotides from the 5' end of the intron and 430 nucleotides from its 3' end (Fig. 1). The central AluI consensus region of α2R1, 281 bp long, is preceded at the 5' end by a long stretch of poly(T) and is flanked at both ends by short direct repeats (Fig. 2, top) (36). Three direct repeats (I-III) are present at the 5' end and one (IV) at the 3' end. Only direct repeats III and IV are identical. Direct repeat II differs from III by the insertion of a Thd and a Cyd to Ado change, while direct repeat I differs from III by a Thd to Ado and a Cyd to Ado change.
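Describing how one short direct repeat differs from another (an inserted Thd, a Cyd to Ado change) is a small pairwise comparison. The sketch below uses Python's standard-library `difflib`, which is a generic sequence differ rather than a biological aligner. The two repeat sequences are invented toys, not the repeats of Fig. 2.

```python
from difflib import SequenceMatcher


def describe_differences(reference: str, variant: str) -> list[str]:
    """List insertions, deletions, and substitutions turning reference into variant.

    Positions are 1-based on the reference; 'after position 0' means before
    the first base. difflib is a generic differ, not a biological aligner,
    so for short or repetitive sequences other equally valid descriptions exist.
    """
    changes = []
    matcher = SequenceMatcher(None, reference, variant)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "insert":
            changes.append(f"insertion of {variant[j1:j2]} after reference position {i1}")
        elif tag == "delete":
            changes.append(f"deletion of {reference[i1:i2]} at reference position {i1 + 1}")
        elif tag == "replace":
            changes.append(f"{reference[i1:i2]} -> {variant[j1:j2]} at reference position {i1 + 1}")
    return changes


if __name__ == "__main__":
    # Invented 7- and 8-nt repeats, not the sequences of Fig. 2.
    for change in describe_differences("AGCTCAG", "AGTCTAAG"):
        print(change)
```

For the toy pair this reports an inserted T after position 2 and a C to A change at position 5, the same kind of description used for repeats II and III above.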

The other pro-α2(I) AluI repeat (α2R2) is located in the flanking sequences 4 kb downstream from the termination codon. Like α2R1, α2R2 is an inverted AluI repeat, with 80% homology to the AluI consensus sequence; it differs from α2R1 both in the internal segment and in the number of flanking short direct repeats. Three direct repeats are present at the 5' end (I-III) and two at the 3' end (IV and V) (Fig. 2, bottom). These direct repeats are seven to nine nucleotides long, and only two of them, II and IV, are identical. Direct repeat III differs from I only by the presence of an extra Cyd at its 3' end. In addition, the seven nucleotides of direct repeat II plus the first seven nucleotides of direct repeat III are identical with the corresponding first 14 nucleotides of direct repeats IV and V.
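A homology figure such as the one quoted for α2R2 is, in its simplest form, the fraction of identical positions in an alignment with the consensus. The sketch below assumes the two sequences are already aligned, with "-" for gaps, and ignores columns where both sequences have a gap. The paper does not state how its homology value was computed, and the sequences below are toys.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identical columns in a pairwise alignment (gap-gap columns skipped)."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = matches = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == gap and y == gap:
            continue  # column carries no information about either sequence
        columns += 1
        if x == y:
            matches += 1
    if columns == 0:
        raise ValueError("alignment has no informative columns")
    return 100.0 * matches / columns


if __name__ == "__main__":
    print(percent_identity("GGCCGGGCGC", "GGCTGGGCGC"))  # -> 90.0 (toy sequences)
```

Other conventions, such as dividing by the length of the shorter sequence or excluding all gapped columns, give somewhat different numbers for the same alignment.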

At present, no conclusions can be drawn about the role that these AluI sequences might have played in the evolution of this gene. It is interesting to note, however, that our studies show that two different AluI members are also closely associated with the human pro-α1(I) collagen gene,⁵ which resides on a different chromosome (4). Unlike the AluI sequences described here, the AluI consensus sequences of the pro-α1(I) repeats are flanked at the 3' end by a long poly(A) stretch (36). As with α2R2, one of the pro-α1(I) AluI repeats is located 4 kb downstream from the termination codon, whereas the other repeat is contained within an intron of the α-chain between amino acid residues 411 and 412. Finally, the two AluI repeats associated with each gene are spaced an almost identical distance, 12 kb, apart. One could speculate that the difference in the relative position of the intronic AluI repeats in the two genes may be due to the greater compactness of the human pro-α1(I) collagen gene, which is only 18 kb in size.⁵

Fig. 3. Northern blot hybridizations of total human poly(A+) RNA from cultured fibroblasts with various genomic probes. Upper half, restriction map of a portion of the 3' end of the pro-α2(I) collagen gene, including part of the flanking sequences. Exons 1 and 2 are shown as dark boxes; introns I and II are shown as white boxes. The locations of the termination codon (TAA) and the α2R2 repeat (cross-hatched box) are provided as points of reference to the map shown in Fig. 1. The letters represent the sites of different restriction enzymes: B, BamHI; E, EcoRI; H, HincII. Lower half, Northern blot hybridizations of various genomic fragments with 0.5 μg of total poly(A+) RNA. Lane 1, the probe used was the 1.5-kb genomic fragment EcoRI:EcoRI (E:E). Lane 2, the probe used was the 1.8-kb genomic fragment EcoRI:BamHI (E:B). Lane 3, the probe used was the 1.5-kb genomic fragment HincII:BamHI (H:B). Lane 4, the probe used was the 3.6-kb genomic fragment BamHI:BamHI (B:B), which, in contrast to other reports (46), did not hybridize to any mature pro-α2(I) collagen mRNA species under our experimental conditions, even after long exposure. Lanes 5 and 6, hybridization with cDNA probes specific for the pro-α2(I) chain (Hf-32) and the pro-α1(I) chain (Hf-677). The sizes of the different RNA species were determined by running RNA markers in a parallel slot and visualizing them with ethidium bromide staining. The RNA markers used were: 35 S poliovirus RNA (7.6 kb), 28 S (5.1 kb) and 18 S (2.15 kb) chicken fibroblast rRNA, and 23 S (3.3 kb) and 16 S (1.54 kb) Escherichia coli rRNA.

Polymorphic mRNAs—We have reported that when total poly(A⁺) RNA and polysomal RNA from normal human fibroblasts are blotted onto nitrocellulose paper and hybridized to the pro-α1(I) and pro-α2(I) cDNA clones, multiple bands of almost equal intensity are observed with each probe (Fig. 3) (5, 6). This finding indicates that both type I collagen genes transcribe more than one mRNA. We have found that this phenomenon also holds for mouse and chicken fibroblasts (data not shown). Polymorphic mRNAs arising from either length heterogeneity or alternative splicing have already been described for several other eukaryotic genes (15-19). In the mouse α-amylase gene these transcriptional variations are related to tissue specificity (20), whereas in the μ-immunoglobulin system and the calcitonin gene they appear to cause the expression of functionally diverse products (38, 39). Quantitative and qualitative differences in the types and proportions of the different collagen proteins in various normal and abnormal tissues have been reported (1). In order to ultimately assign a biological and functional role to these mRNAs, we have addressed the basic question of the exact number and structural characteristics of the pro-α2(I) collagen gene transcripts in human fibroblasts.

First, we located the transitions between the different RNAs at the 3' end of the gene by Northern blot hybridizations with the appropriate genomic subclones. Second, we confirmed the colinearity of these transcripts by R-looping experiments between the genomic subclones and the collagen mRNAs. Third, we defined the exact location of the termini of the transcripts by S1 nuclease protection experiments in conjunction with DNA sequencing. The estimated sizes of the three major mRNA bands observed by Northern blot hybridization with the pro-α2(I) cDNA probe are 6.2, 5.7, and 5.5 kb (Fig. 3). This pattern was consistently seen in a number of human fibroblast cell lines. However, the 5.5-kb band, which represents only 5-10% of the hybridizing RNA, was not detectable by Northern blot hybridization, R-looping analysis, or S1 nuclease protection experiments using genomic probes specific for the 3'-untranslated region. At this time, we do not have a conclusive explanation for the nature of this RNA species or for the location of its 3' terminus. cDNA cloning experiments aimed at isolating this transcript and characterizing it directly by DNA sequencing are currently in progress.
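The band sizes quoted above were read against the RNA markers listed in the Fig. 3 legend. A common way to do this is to fit log10(size) against migration distance for the markers and interpolate the unknown bands; the paper does not state which fitting method was used, so the sketch below is only an illustration under that assumption. The marker sizes come from the legend, but every migration distance is hypothetical, chosen so that the three estimates fall near the 6.2-, 5.7-, and 5.5-kb values reported in the text.

```python
import math
from typing import Sequence, Tuple

# RNA marker sizes (kb) taken from the Fig. 3 legend.
MARKER_SIZES_KB = [7.6, 5.1, 3.3, 2.15, 1.54]
# HYPOTHETICAL migration distances (mm) for the same markers; the paper does not report them.
MARKER_DISTANCES_MM = [11.9, 29.2, 48.1, 66.8, 81.3]


def fit_semilog(distances: Sequence[float], sizes_kb: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of log10(size) = intercept + slope * distance."""
    ys = [math.log10(s) for s in sizes_kb]
    n = len(distances)
    mean_x = sum(distances) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in distances)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(distances, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def estimate_size_kb(distance_mm: float, slope: float, intercept: float) -> float:
    """Read an unknown band's size (kb) off the fitted semi-log standard curve."""
    return 10 ** (intercept + slope * distance_mm)


slope, intercept = fit_semilog(MARKER_DISTANCES_MM, MARKER_SIZES_KB)
# HYPOTHETICAL positions (mm) of three hybridizing bands.
for band_mm in (20.8, 24.4, 26.0):
    print(f"band at {band_mm} mm ~ {estimate_size_kb(band_mm, slope, intercept):.2f} kb")
```

A semi-log curve is only approximately linear over a wide size range, so bands that differ by a few percent in length, such as the co-migrating species discussed below, are hard to resolve this way.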

Fig. 4. R-looping analysis of the 3' end of the pro-α2(I) collagen gene. The 1.8-kb genomic fragment EcoRI:BamHI (which includes part of exon 1 and the 3'-flanking region; see maps in Figs. 1 and 3) was subcloned in pBR322 and linearized with the enzyme PvuII prior to the R-looping hybridization. Three different types of DNA:RNA hybrid molecules were seen using the same genomic fragment; they are shown in A, B, and C. Interpretive tracings of the three micrographs at the left are shown on the right, with the sizes of the R-loops indicated. Solid lines, DNA; broken lines, RNA. The sizes of the three R-loops were determined using the double-stranded replicative form of phage φX174 DNA as a standard. The bar in the lower left corner of the pictures is a length standard of 1.0 kb.

Analysis of the 3' End of the Human Pro-α2(I) Collagen Gene

[Fig. 5 image: restriction map of the 3' end of the gene with sites E¹, H²⁹⁰, A⁴⁸⁷, A¹⁰²⁰, and B¹⁸⁰⁰ and the termination codon (TAA) near position 38 (scale bar, 0.1 kb); probes E:A (487 nt) and A:A (533 nt) with S1-resistant products of 340 and 396 nt; autoradiograms, lanes 1-4 for each probe, with bands marked at 487, 340, and 250 nt.]

Fig. 5. S1 nuclease mapping of the 3' termini of human fibroblast pro-α2(I) collagen mRNAs. Upper half, restriction map of the 3' end of the gene and part of its flanking sequences. The letters designate the sites of the different restriction enzymes: A, AvaII; B, BamHI; E, EcoRI; H, HincII. The superscripts indicate the nucleotide distance of the various cleavage sites from that of EcoRI (designated as 1). The termination codon (TAA) is shown at the end of the coding sequence (white box). The solid lines represent the DNA fragments used in these experiments; the broken lines represent the resistant products obtained after S1 digestion. The asterisks indicate the 3' end labeling of the molecules. The two fragments used were EcoRI:AvaII (E:A), 487 nucleotides long, and AvaII:AvaII (A:A), 533 nucleotides long. Both fragments were 3' end-labeled, strand-separated, and electroeluted. Constant amounts of the labeled antistrands (1 ng) were hybridized to increasing concentrations of total poly(A⁺) RNA under R-loop conditions to minimize self-reannealing. The hybrids were then subjected to S1 nuclease digestion as described by Berk and Sharp (29). Lower half, the left side shows the autoradiogram of the S1 nuclease digestion using the 487-nucleotide EcoRI:AvaII (E:A) fragment. Lane 1, labeled DNA with no RNA, S1-treated; lane 2, labeled DNA hybridized to 3 μg of total poly(A⁺) RNA and treated with S1; lane 3, labeled DNA hybridized to 1 μg of total poly(A⁺) RNA and treated with S1; lane 4, labeled DNA with no RNA and no S1 treatment. The right side shows the autoradiogram of the S1 nuclease digestion using the 533-nucleotide AvaII:AvaII (A:A) fragment. Lane 1, labeled DNA with no RNA and no S1 treatment; lane 2, labeled DNA with no RNA, S1-treated; lane 3, labeled DNA hybridized to 1 μg of total poly(A⁺) RNA and treated with S1; lane 4, labeled DNA hybridized to 10 μg of total poly(A⁺) RNA and treated with S1. The sizes of the S1-resistant products are indicated as averaged values. The AvaII:AvaII fragment contained a small, fast-moving contaminant (right, lane 1) that did not interfere with the S1 assay results.

The 5.7-kb mRNAs—The experiments summarized in Fig. 3 clearly show that the transition between the 5.7- and 6.2-kb mRNAs is located in the 3'-untranslated region of the pro-α2(I) gene, more precisely in exon 1 around the HincII site. This observation was visually confirmed by R-looping experiments using the genomic EcoRI:BamHI fragment subcloned in pBR322. Ten of the 41 R-loops analyzed showed a DNA:RNA hybrid of 384 ± 73 nucleotides (Fig. 4A). To determine the exact terminus of the 5.7-kb mRNA, S1 nuclease protection experiments were performed using as probe the EcoRI:AvaII (E:A) genomic fragment, which extends 194 nucleotides beyond the HincII site (Fig. 5). This 487-nucleotide fragment was 3' end-labeled, strand-separated, hybridized to total fibroblast poly(A⁺) RNA, and subjected to S1 digestion, and the reaction products were run on a polyacrylamide gel. A major S1-resistant product was seen as a close triplet of bands 345, 333, and 330 nucleotides long, designated in Fig. 5 by an average value of 340. This result placed the polyadenylation attachment site of the 5.7-kb mRNA within 20 nucleotides of the canonical AAUAAA signals (Fig. 6), in accordance with the observed 384 ± 73-nucleotide R-loop. The end of the 5.7-kb mRNA transcript is shown in Fig. 5 between positions 285 and 307 from the termination codon. The 487-nucleotide S1-resistant product resulted from complete protection of the EcoRI:AvaII fragment by the 6.2-kb mRNA, which again confirmed the colinearity of the major transcripts. Furthermore, after longer exposure, a minor S1-resistant species 250 nucleotides long was seen, accounting for less than 5% of the total protected material. The pentanucleotide AUUAA, a shorter variation of the canonical signal, closely precedes the end of this mRNA (Fig. 6). That this minor transcript had not been seen by electron microscopy or Northern blotting was probably due to its low representation and to its small difference in length from the major 5.7-kb mRNA. We ruled out the possibility that this minor S1-resistant species represents the 5.5-kb mRNA by performing Northern blot hybridizations with a 290-bp EcoRI:HincII genomic fragment (Figs. 5 and 6). This short genomic fragment covers the 250 bp specific for the minor S1-resistant product and is therefore a more sensitive probe for hybridizing to the 5.5-kb RNA. In these experiments, the EcoRI:HincII probe detected only the 6.2- and 5.7-kb bands, even under less stringent conditions of hybridization (data not shown).
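The step from an S1-resistant product length to a 3' terminus position is coordinate arithmetic on the Fig. 5 map, and can be sketched as below. The sketch assumes that each probe's antistrand was 3' end-labeled, which puts the label at the upstream restriction site (EcoRI for E:A, the AvaII site at 487 for A:A), and that the termination codon lies at position 38 on the Fig. 5 scale. That value is read from the damaged figure; it is consistent with the 845 and 650 positions given for the 6.2-kb species. The function and constant names are hypothetical, not from the paper.

```python
# Fig. 5 coordinate scale: the EcoRI site is position 1 and the other sites are
# numbered by their distance from it (E:A probe spans 1-487, A:A spans 487-1020).
ECORI_SITE = 1
AVAII_SITE = 487
# ASSUMPTION: the TAA termination codon lies at position 38 on this scale.
TAA_POSITION = 38


def polya_site_from_stop(labeled_site: int, protected_len: int,
                         stop_pos: int = TAA_POSITION) -> int:
    """Approximate 3' terminus, in nucleotides downstream of the termination codon.

    Assumes the probe's label sits at `labeled_site`, so the S1-resistant product
    spans from that end to the mRNA's poly(A) attachment site. Numbering
    conventions make the result accurate only to about +/-1 nt.
    """
    return (labeled_site - stop_pos) + protected_len


# (description, labeled site, averaged S1-resistant length) as reported in the text.
PRODUCTS = [
    ("E:A probe, 5.7-kb mRNA", ECORI_SITE, 340),
    ("E:A probe, minor species", ECORI_SITE, 250),
    ("A:A probe, 6.2-kb doublet", AVAII_SITE, 396),
    ("A:A probe, 6.2-kb single band", AVAII_SITE, 201),
]

for name, site, length in PRODUCTS:
    print(f"{name}: ~{polya_site_from_stop(site, length)} nt past TAA")
```

The averaged 340-nt product lands inside the 285 to 307 window given above; individual bands of the triplet shift the estimate by a few nucleotides.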

The 6.2-kb mRNAs—The pattern obtained by Northern blot hybridization with the different 3' end genomic fragments (Fig. 3) indicates that the end of the 6.2-kb band is located within the HincII:BamHI region. Of the 41 DNA:RNA hybrids analyzed, 20 showed a hybrid 866 ± 68 nucleotides long and 11 showed a DNA:RNA molecule 648 ± 53 nucleotides long (Fig. 4, B and C). This finding suggested that the 6.2-kb band seen by Northern blot hybridization (Fig. 3) was indeed a composite of two mRNA species with different 3' termini. The ends of these two transcripts were determined by S1 nuclease protection experiments using the 533-nucleotide AvaII:AvaII fragment immediately adjacent to the 487-nucleotide EcoRI:AvaII fragment (Fig. 5). Two S1-resistant bands of almost equal intensity were seen: one 201 nucleotides long, the other a doublet averaging 396 nucleotides. The latter would place the end of its transcript around position 845 from the termination codon (Fig. 6), in good agreement with the presence of AAUAAA signals in that area and with the 866 ± 68-nucleotide R-loop. The former would place the end of the other transcript at position 650 from the termination codon, or 20 nucleotides from an AUUAAA signal (Fig. 6), accounting for the mRNA species seen as a 648 ± 53-nucleotide R-loop.
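The agreement between the R-loop and S1 measurements can be made explicit. Because the R-loop subclone begins at the EcoRI site, the hybrid expected for each species is roughly the distance from EcoRI to its S1-mapped terminus. The sketch below infers the label positions on the Fig. 5 scale (EcoRI at 1 for the E:A probe, AvaII at 487 for the A:A probe) and treats each "±" value as one standard deviation, which the text does not state; all names are hypothetical.

```python
from typing import NamedTuple


class Species(NamedTuple):
    name: str
    labeled_site: int  # labeled end of the S1 probe on the Fig. 5 scale (inferred)
    s1_len: int        # averaged S1-resistant product length (nt), from the text
    rloop_mean: int    # R-loop hybrid length (nt), from the text
    rloop_sd: int      # the "+/-" value, ASSUMED to be one standard deviation


SPECIES = [
    Species("5.7-kb", 1, 340, 384, 73),
    Species("6.2-kb, longer 3' end", 487, 396, 866, 68),
    Species("6.2-kb, shorter 3' end", 487, 201, 648, 53),
]


def predicted_hybrid_len(sp: Species) -> int:
    """Hybrid expected in the EcoRI:BamHI subclone: EcoRI end (position 1) to the 3' terminus."""
    return (sp.labeled_site - 1) + sp.s1_len


for sp in SPECIES:
    pred = predicted_hybrid_len(sp)
    z = (pred - sp.rloop_mean) / sp.rloop_sd
    print(f"{sp.name}: S1 predicts {pred} nt, R-loop {sp.rloop_mean} +/- {sp.rloop_sd} nt, z = {z:+.2f}")
```

Under these assumptions each S1 prediction falls within one standard deviation of the corresponding R-loop mean, which is the sense in which the two methods agree.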

From these data, we conclude that the 6.2-kb band seen by Northern blots actually consists of two co-migrating RNAs whose 3'-untranslated regions differ in length by about 200 nucleotides. One possible explanation is that the size difference at the 3' end is compensated by a difference in the length of their 5'-untranslated regions. Isolation of the 5' end of the pro-α2(I) collagen gene will allow us to test this hypothesis and to determine whether these differences are due to differential initiation or to differential splicing.

Fig. 6. Nucleotide sequence of exon 1 of the human pro-α2(I) collagen gene and its immediate 3'-flanking region. The last 48 amino acid codons of the COOH-terminal propeptide are numbered 200-247. The nucleotides of the 3'-untranslated region of exon 1 and the adjacent 3'-flanking regions are numbered 1-983. Some of the restriction enzyme sites are indicated in reference to the maps shown in Figs. 1, 3, and 5. Polyadenylation signals utilized by the fibroblast transcripts are boxed; other potential signals are underlined. The bracket beneath the nucleotide sequences indicates the approximate termination points of the four pro-α2(I) collagen mRNAs.

Polyadenylation Sites—The role of the hexanucleotide AAUAAA in eukaryotic genes as a signal sequence preceding the recognition site for polyadenylation and/or polymerase II termination has been well established (40). However, the analysis of numerous cloned genes has brought new insights into the complexity of the factors involved in eukaryotic mRNA termination (41). In the 845 nucleotides of the 3'-untranslated region of the human pro-α2(I) collagen gene, we have found several potential polyadenylation signals (Fig. 6), but only four of them are clearly utilized by the mature fibroblast transcripts. Moreover, two of these signals appear to be variations of the canonical sequence. The first, AUUAAA, has been reported to be a functional site for the mouse pancreatic α-amylase mRNA (42). The second, AUUAA, is a shorter version of this variant. It is interesting to note that the 750-nucleotide dihydrofolate reductase mRNA also utilizes a shorter version (AUAA) of the canonical signal (AAUAAA) (16, 43). We have not yet tested whether the expression and modulation of these transcripts vary under different conditions of cell culture or in tissues differentially expressing type I procollagen. In any event, it appears clear that the maturation of the pro-α2(I) collagen mRNA, besides the intrinsic complexity of the numerous splicing events, is further complicated by the generation of products of different sizes. Our data strongly suggest that the differences are primarily due to the length of the 3'-untranslated region. It could be argued that these multiple transcripts represent monogenic products of three or more pro-α2(I) collagen genes. This is a most unlikely explanation, because biochemical, genetic, and molecular evidence has strongly suggested the presence of only one copy of the pro-α2(I) gene in the human haploid complement (34, 44, 45). Formally, it is also possible that, although the pro-α2(I) gene is present in a single copy, its 3' end may be shared by other genes, generating transcripts of similar size. However, Southern blotting analysis of nuclear DNAs from different individuals, digested with various restriction enzymes and hybridized to the EcoRI:BamHI genomic subclone, clearly showed a unique pattern of single-copy representation (data not shown).
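Locating candidate signals like these in a 3'-untranslated sequence is a simple motif search. The sketch below scans for the three signals named in the text (AAUAAA, AUUAAA, and AUUAA), reports overlapping occurrences, avoids counting an AUUAA that is only the first five bases of an AUUAAA, and keeps hits ending 10 to 30 nt upstream of a given poly(A) site. That window is a general rule of thumb assumed here, not a value from the paper, and the toy sequence is hypothetical rather than the Fig. 6 sequence; the function names are illustrative.

```python
import re
from typing import List, Tuple

# Signals named in the text: canonical AAUAAA plus the AUUAAA and AUUAA variants.
HEXAMERS = ("AAUAAA", "AUUAAA")
PENTAMER = "AUUAA"


def find_polya_signals(seq: str) -> List[Tuple[int, str]]:
    """Return sorted (0-based start, motif) hits; DNA input (T) is converted to RNA (U).

    A zero-width lookahead lets re.finditer report overlapping occurrences. An AUUAA
    that is merely the start of an AUUAAA hit is not reported a second time.
    """
    rna = seq.upper().replace("T", "U")
    hits = []
    for motif in HEXAMERS + (PENTAMER,):
        for m in re.finditer(f"(?={motif})", rna):
            start = m.start()
            if motif == PENTAMER and rna.startswith("AUUAAA", start):
                continue
            hits.append((start, motif))
    return sorted(hits)


def signals_upstream_of(seq: str, polya_site: int,
                        min_gap: int = 10, max_gap: int = 30) -> List[Tuple[int, str]]:
    """Keep hits whose 3' end lies min_gap..max_gap nt upstream of polya_site (0-based)."""
    return [
        (start, motif)
        for start, motif in find_polya_signals(seq)
        if min_gap <= polya_site - (start + len(motif)) <= max_gap
    ]


# HYPOTHETICAL toy sequence, not the Fig. 6 sequence.
toy = "GG" + "AAUAAA" + "C" * 15 + "AUUAAA" + "G" * 10 + "AUUAA" + "C"
print(find_polya_signals(toy))
print(signals_upstream_of(toy, polya_site=45))
```

With a real sequence, passing each S1-mapped terminus as `polya_site` would list the signals that could plausibly account for it.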

Finally, it is tempting to speculate that, in some inherited or acquired disorders of connective tissue, an altered expression of one or more of these transcripts due to mutations at any of the control levels may result in a change in the production of functional pro-α2(I) mRNA.